Demand Planning Analysis with R

Data Transformation Part 1

code
analysis
Author

Richard A. Maestas

Published

May 24, 2025

Translating Excel VLOOKUPs to Joins in R

Introduction

If you’re a supply chain professional or small business owner still relying heavily on Excel for demand planning, it’s time to consider upgrading your toolkit. This post is part of a series exploring how R—a powerful, open-source programming language—can enhance your demand planning processes.

In this installment, we focus on one of the most common Excel tasks: the VLOOKUP. We’ll demonstrate how to replicate and improve this functionality using joins in R with the dplyr package. This approach increases efficiency, ensures data consistency, and sets the stage for more advanced analytics.

Use Case Overview: Preparing Sales Data for Analysis

Using data from Jungle Scout, a market research platform, I collected daily sales data for various Amazon yoga products. The data can be found here. While the software allows us to export product attributes and daily sales, the data comes in separate files and lacks a shared structure.

To analyze this effectively for demand planning, we need to combine these data sets into a single table, similar to how you’d use VLOOKUPs in Excel.

Step 1: Load the Required Libraries

First, let’s load the tools we’ll need.

library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.5.1     ✔ purrr   0.3.5
✔ tibble  3.2.1     ✔ dplyr   1.1.4
✔ tidyr   1.2.1     ✔ stringr 1.5.0
✔ readr   2.1.3     ✔ forcats 0.5.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
#optional
library(broom)
library(kableExtra)

Attaching package: 'kableExtra'
The following object is masked from 'package:dplyr':

    group_rows

The tidyverse is an essential R package for data wrangling, while broom and kableExtra are optional and help with summaries and table formatting for purposes of this tutorial.

Step 2: Load and Clean the Sales Data

#set file path to where the data is stores
setwd('inputs/js_yoga')

#note: the file path mirrors the path set up in the google drive

#daily sales data
daily_sales <- read.csv("compiled_daily_sales.csv", stringsAsFactors = FALSE)

#product attributes data
product_attributes <- read.csv("js_yoga_products_attributes.csv", stringsAsFactors = FALSE)

Let’s examine the structure of the daily sales data:

summary(daily_sales)
     Date            Units.Sold            Rank              Rating         
 Length:11264       Length:11264       Length:11264       Length:11264      
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
   Reviews           product_id       
 Length:11264       Length:11264      
 Class :character   Class :character  
 Mode  :character   Mode  :character  

We see over 11,000 records, but duplicates are likely due to the manual file creation process. So let’s remove any possible duplicate combinations of product_id and date.

daily_sales <- daily_sales %>%
  distinct(product_id, Date, .keep_all = TRUE)

After removing duplicates, we’re left with 10,824 unique records. Here is a snapshot of the data:

head(daily_sales)
        Date Units.Sold Rank Rating Reviews product_id
1 2025-02-17        391  728   N.A.    N.A. B08JGTW5TT
2 2025-02-18        344  951   N.A.    N.A. B08JGTW5TT
3 2025-02-19        324 1072   N.A.    N.A. B08JGTW5TT
4 2025-02-20        311 1166   N.A.    N.A. B08JGTW5TT
5 2025-02-21        312 1154   N.A.    N.A. B08JGTW5TT
6 2025-02-22        298 1262    4.3   11533 B08JGTW5TT

Step 3: Load and Clean Product Attribute Data

Now let’s do the same thing for the products attributes data.

#remove duplicates
product_attributes <- product_attributes %>%
  distinct(ASIN, .keep_all = TRUE)

glance(product_attributes)
# A tibble: 1 × 4
   nrow  ncol complete.obs na.fraction
  <int> <int>        <int>       <dbl>
1   439    21          380     0.00640

Now we have 439 unique product records with 21 attributes, including price, brand, reviews, dimensions, and more. Here is a snaphot:

kableExtra::kable(head(product_attributes)) %>% 
  kable_styling() %>% 
  column_spec(2, width = "150px", extra_css = "white-space: nowrap; overflow: hidden; text-overflow: ellipsis;")
ASIN Product.Name Brand Price Monthly.Units.Sold Daily.Units.Sold Monthly.Revenue Date.First.Available Net.Revenue Star.Rating Reviews Amazon.Fees BSR LQS Seller.Type Sellers Category Product.Tier Dimensions Weight Link
B0C7J96PK2 Knee Pad Cushion - 0.6 inch (15mm) Thick Exercise Knee Pad for Pain Free Yoga - Perfect Companion for Home Workout - Easy on the Knees, Elbows, Wrists and Back (Does Not Include Yoga Mat) Kinesis Yoga $13.99 509 15 $7,121 06/07/2023 $7.66 4.7 226 $6.33 18,144 7 FBA 1 Sports & Outdoors Small standard 4.45 x 10.2 x 4.41 in 0.14 kg https://www.amazon.com/dp/B0C7J96PK2?th=1&psc=1
B0CDLP9SQ3 Incense Holder, Wooden Incense Holder for Sticks with Glass Ash Catcher, Anti-ash Incense Burner with 20 Incense Sticks and a Ash Brush for Meditation Yoga Spa Room Decor CEREMONY $9.90 12,894 390 $127,651 08/02/2023 $3.69 4.6 2,698 $6.26 1,958 7 FBA 2 Home & Kitchen Large standard 3.937 x 9.803 x 3.504 in 0.683 lb https://www.amazon.com/dp/B0CDLP9SQ3?th=1&psc=1
B08C1K6JBX merka Yoga Cards, 50 Yoga Deck Flashcards with Asana Poses for Class Sequencing and Practice, Sanskrit and Cues for Beginners, Teachers, Women Workout Fitness Ideal Gift merka $21.98 345 11 $7,583 06/28/2020 $13.79 4.4 806 $8.19 27,413 7 FBA 1 Toys & Games Large standard 4.37 x 2.6 x 0.87 in 4.16 oz https://www.amazon.com/dp/B08C1K6JBX?th=1&psc=1
B01LP0U5X0 Amazon Basics 1/2 Inch Extra Thick Exercise Yoga Mat with Carrying Strap Amazon Basics $17.80 39,280 1,380 $699,184 08/09/2017 $8.27 4.6 102,031 $9.53 10 6 AMZ 3 Sports & Outdoors Large standard 7.6 x 25.197 x 7.5 in 1 kg https://www.amazon.com/dp/B01LP0U5X0?th=1&psc=1
B0CGY6G5PY Palace Learning Yoga Pose Exercise Cards - Yoga Positions and Workout Fitness Cards with Over 50 Poses - 2.5' x 3.5' palace learning $7.97 102 3 $812.94 08/30/2023 $2.87 4.4 80 $5.10 69,410 4 FBA 1 Sports & Outdoors Large standard 3.62 x 2.6 x 0.91 in 0.1 kg https://www.amazon.com/dp/B0CGY6G5PY?th=1&psc=1
B09Y1QQB26 WLLWOO WLLWOO Yoga Bags for Women with Yoga Mats Bags Carrier Carryall Canvas Tote for Pilates Shoulder for Travel Office Beach Workout WLLWOO WLLWOO $16.99 1,525 57 $25,910 05/31/2022 $5.01 4.6 318 $11.98 4,189 7 FBA 2 Sports & Outdoors Large standard 15 x 5 x 14 in 9.17 oz https://www.amazon.com/dp/B09Y1QQB26?th=1&psc=1

Step 4: Joining the Datasets in R

In Excel, you’d likely use a VLOOKUP to merge these. In R, we use left_join().

This tutorial is not meant to be a comprehensive break down of all the joins functions in R. For further exploration check out r4ds.1

js_daily_sales <- daily_sales %>%
  left_join(product_attributes, by = c("product_id" = "ASIN"))

#note: in this case we are joining the product_id 
#from the daily sales data with the ASIN 
#from the product attributes data

This creates a comprehensive data frame where each sales record is enriched with relevant product attributes.

#the broom:: package aids with more elegant summaries
broom::glance(js_daily_sales)
# A tibble: 1 × 4
   nrow  ncol complete.obs na.fraction
  <int> <int>        <int>       <dbl>
1 10824    26        10120     0.00250

Now we can start the demand planning exploratory analysis! Here are the first six records of the final data frame:

#using kableExtra for better formatting fot html docs
kableExtra::kable(head(js_daily_sales)) %>% 
  kable_styling() %>% 
  column_spec(c(1,7), width = "150px", extra_css = "white-space: nowrap; overflow: hidden; text-overflow: ellipsis;")
Date Units.Sold Rank Rating Reviews.x product_id Product.Name Brand Price Monthly.Units.Sold Daily.Units.Sold Monthly.Revenue Date.First.Available Net.Revenue Star.Rating Reviews.y Amazon.Fees BSR LQS Seller.Type Sellers Category Product.Tier Dimensions Weight Link
2025-02-17 391 728 N.A. N.A. B08JGTW5TT Sarin Mathews Womens Yoga Sweatpants Wide Leg High Waisted Lounge Pajamas Pants Comfy Workout Loose Pants with Pockets Sarin Mathews $19.99 9,146 323 $182,829 10/22/2020 $12.94 4.3 11,841 $7.05 1,878 7 FBA 1 Clothing, Shoes & Jewelry Small standard 9.65 x 10.91 x 2.56 in 1.76 oz https://www.amazon.com/dp/B08JGTW5TT?th=1&psc=1
2025-02-18 344 951 N.A. N.A. B08JGTW5TT Sarin Mathews Womens Yoga Sweatpants Wide Leg High Waisted Lounge Pajamas Pants Comfy Workout Loose Pants with Pockets Sarin Mathews $19.99 9,146 323 $182,829 10/22/2020 $12.94 4.3 11,841 $7.05 1,878 7 FBA 1 Clothing, Shoes & Jewelry Small standard 9.65 x 10.91 x 2.56 in 1.76 oz https://www.amazon.com/dp/B08JGTW5TT?th=1&psc=1
2025-02-19 324 1072 N.A. N.A. B08JGTW5TT Sarin Mathews Womens Yoga Sweatpants Wide Leg High Waisted Lounge Pajamas Pants Comfy Workout Loose Pants with Pockets Sarin Mathews $19.99 9,146 323 $182,829 10/22/2020 $12.94 4.3 11,841 $7.05 1,878 7 FBA 1 Clothing, Shoes & Jewelry Small standard 9.65 x 10.91 x 2.56 in 1.76 oz https://www.amazon.com/dp/B08JGTW5TT?th=1&psc=1
2025-02-20 311 1166 N.A. N.A. B08JGTW5TT Sarin Mathews Womens Yoga Sweatpants Wide Leg High Waisted Lounge Pajamas Pants Comfy Workout Loose Pants with Pockets Sarin Mathews $19.99 9,146 323 $182,829 10/22/2020 $12.94 4.3 11,841 $7.05 1,878 7 FBA 1 Clothing, Shoes & Jewelry Small standard 9.65 x 10.91 x 2.56 in 1.76 oz https://www.amazon.com/dp/B08JGTW5TT?th=1&psc=1
2025-02-21 312 1154 N.A. N.A. B08JGTW5TT Sarin Mathews Womens Yoga Sweatpants Wide Leg High Waisted Lounge Pajamas Pants Comfy Workout Loose Pants with Pockets Sarin Mathews $19.99 9,146 323 $182,829 10/22/2020 $12.94 4.3 11,841 $7.05 1,878 7 FBA 1 Clothing, Shoes & Jewelry Small standard 9.65 x 10.91 x 2.56 in 1.76 oz https://www.amazon.com/dp/B08JGTW5TT?th=1&psc=1
2025-02-22 298 1262 4.3 11533 B08JGTW5TT Sarin Mathews Womens Yoga Sweatpants Wide Leg High Waisted Lounge Pajamas Pants Comfy Workout Loose Pants with Pockets Sarin Mathews $19.99 9,146 323 $182,829 10/22/2020 $12.94 4.3 11,841 $7.05 1,878 7 FBA 1 Clothing, Shoes & Jewelry Small standard 9.65 x 10.91 x 2.56 in 1.76 oz https://www.amazon.com/dp/B08JGTW5TT?th=1&psc=1

Benefits of Using Joins in R Over VLOOKUPs in Excel

  • Consistency: Code-based joins are less error-prone than manual lookups.

  • Scalability: Handle larger datasets effortlessly.

  • Transparency: Your transformation steps are fully documented and repeatable.

  • Flexibility: Joins allow for more complex merge logic (e.g., multi-key joins, inner/outer joins).


Conclusion

Transitioning from Excel to R for demand planning doesn’t have to be daunting. In fact, using R for tasks like joining data sets can improve both the quality and efficiency of your analysis.

By mastering tools like left_join() from dplyr, you lay the groundwork for more robust forecasting and reporting workflows.

Stay tuned for the next post in this series, where we’ll explore data reshaping techniques using group_by to translate another common Excel function–the Pivot Table.

No matching items

Footnotes

    • R for Data Science (2e) - Wickham & Grolemund
    ↩︎